Comparing Recurring Lexico-Syntactic Trees (RLTs) and Ngram Techniques for Extended Phraseology Extraction

Authors

  • Agnès Tutin
  • Olivier Kraif
Abstract

This paper assesses to what extent a syntax-based method, Recurring Lexico-Syntactic Tree (RLT) extraction, can extract large phraseological units such as prefabricated routines in scientific writing, e.g. "as previously said" or "as far as we/I know". To evaluate the method, we compare it with the classical n-gram extraction technique on a subset of recurring segments that include speech verbs, drawn from a French corpus of scientific writing. The results show that RLT extraction is far more accurate for extended multiword expressions (MWEs) such as routines or collocations, but performs worse on surface phenomena such as syntactic constructions or fully frozen expressions.
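
For readers unfamiliar with the n-gram baseline mentioned in the abstract, the following is a minimal sketch of recurring n-gram extraction. It is not the authors' implementation: the function name `recurring_ngrams`, the length range, the frequency threshold, and the toy English sentences are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

def recurring_ngrams(sentences, n_min=2, n_max=5, min_freq=2):
    """Count contiguous token n-grams and keep those seen at least min_freq times.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    counts = Counter()
    for tokens in sentences:
        for n in range(n_min, n_max + 1):
            # Slide a window of length n over the token sequence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}

# Toy corpus (hypothetical), already tokenized on whitespace.
corpus = [
    "as previously said the results differ".split(),
    "as previously said this method works".split(),
    "as far as we know this holds".split(),
]
recurring = recurring_ngrams(corpus)
```

In this sketch only strictly contiguous, identical token sequences are counted together, so surface variants such as "as far as we know" and "as far as I know" would be tallied as separate n-grams.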

Similar resources

Using Lexico-Syntactic Ontology Design Patterns for Ontology Creation and Population

In this paper we discuss the use of information extraction techniques involving lexico-syntactic patterns to generate ontological information from unstructured text and either create a new ontology from scratch or augment an existing ontology with new entities. We refine the patterns using a term extraction tool and some semantic restrictions derived from WordNet and VerbNet, in order to preven...

Freepal: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction

The increasing availability and maturity of both scalable computing architectures and deep syntactic parsers is opening up new possibilities for Relation Extraction (RE) on large corpora of natural language text. In this paper, we present FREEPAL, a resource designed to assist with the creation of relation extractors for more than 5,000 relations defined in the FREEBASE knowledge base (KB). The...

Recognition of Structured Collocations in An Inflective Language

We present a method of the structural collocations extraction for an inflective language (Polish) based on the process divided into two phases: extraction and filtering of the pairs of wordforms reduced to baseforms and structural annotation of the extracted collocations with lexico-syntactic patterns. The parameters of the patterns are specified manually but their instances are generated and t...

Extraction of Semantic Relationships from Academic Papers using Syntactic Patterns

Integrating concept and citation networks on a specific research subject can help researchers focus their own work or use methods described in prior works. In this paper, we propose a method to extract semantic relations from concepts and citation in the descriptions of related work. Specifically, we examined (i) topic-paper relations between research topics and reference papers and (ii) method...

Combining Statistical Techniques and Lexico-syntactic Patterns for Semantic Relations Extraction from Text

We describe here a methodology to combine two different techniques for Semantic Relation Extraction from texts. On the one hand, generic lexicosyntactic patterns are applied to the linguistically analyzed corpus to detect a first set of pairs of co-occurring words, possibly involved in “syntagmatic” relations. On the other hand, a statistical unsupervised association system is used to obtain a ...

Publication date: 2017